Performance and Scalability Analysis of Teraflop-Scale Parallel Architectures Using Multidimensional Wavefront Applications
نویسندگان
چکیده
We develop a model for the parallel performance of algorithms that consist of concurrent, twodimensional wavefronts implemented in a message passing environment. The model, based on a LogGP machine parametrization, combines the separate contributions of computation and communication wavefronts. We validate the model on three important supercomputer systems, on up to 500 processors. We use data from a deterministic particle transport application taken from the ASCI workload, although the model is general to any wavefront algorithm implemented on a 2-D processor domain. We also use the validated model to make estimates of performance and scalability of wavefront algorithms on 100-TFLOPS computer systems expected to be in existence within the next decade as part of the ASCI program and elsewhere. In this context, we analyze two problem sizes. Our model shows that on the largest such problem (1 billion cells), inter-processor communication performance is not the bottleneck. Single-node efficiency is the dominant factor.
منابع مشابه
Scalability Analysis of Multidimensional Wavefront Algorithms on Large-Scale SMP Clusters
We develop a model for the parallel performance of algorithms that consist of concurrent, twodimensional wavefronts implemented in a message passing environment. The model combines the separate contributions of computation and communication wavefronts. We validate the model on three supercomputer systems, with up to 500 processors, using data from an ASCI deterministic particle transport applic...
متن کاملOn the Acceleration of Wavefront Applications using Distributed Many-Core Architectures
In this paper we investigate the use of distributed GPU-based architectures to accelerate pipelined wavefront applications – a ubiquitous class of parallel algorithm used for the solution of a number of scientific and engineering applications. Specifically, we employ a recently developed port of the LU solver (from the NAS Parallel Benchmark suite) to investigate the performance of these algori...
متن کاملScientific Application Performance on Leading Scalar and Vector Supercomputing Platforms
The last decade has witnessed a rapid proliferation of superscalar cache-based microprocessors to build high-end computing (HEC) platforms, primarily because of their generality, scalability, and cost effectiveness. However, the growing gap between sustained and peak performance for full-scale scientific applications on conventional supercomputers has become a major concern in high performance ...
متن کاملScientific Application Performance On Leading Scalar and Vector Supercomputering Platforms
The last decade has witnessed a rapid proliferation of superscalar cache-based microprocessors to build highend computing (HEC) platforms, primarily because of their generality, scalability, and cost effectiveness. However, the growing gap between sustained and peak performance for full-scale scientific applications on conventional supercomputers has become a major concern in high performance c...
متن کاملCache-oblivious wavefront algorithms for dynamic programming problems: efficient scheduling with optimal cache performance and high parallelism
Wavefront algorithms are algorithms on grids where execution proceeds in a wavefront manner from the start to the end of the execution (execution moves through the grid as if a wavefront is moving). Many dynamic programming problems and stencil computations are wavefront algorithms. Iterative wavefront algorithms for evaluating dynamic programming (DP) recurrences exploit optimal parallelism, b...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- IJHPCA
دوره 14 شماره
صفحات -
تاریخ انتشار 2000